violence detection
Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
Jung, Seoik, Song, Taekyung, Lee, Yangro, Lee, Sungjun
Abstract: This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional long-video training approaches, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datasets. Each short clip fully utilizes all frames to preserve temporal continuity, enabling precise recognition of rapid violent events. Experiments demonstrate that the proposed method achieves 95.25% accuracy on RWF-2000 and significantly improves performance on long videos (UCF-Crime: 83.25%), confirming its strong generalization and real-time applicability in intelligent surveillance systems. Recently, video-based violence and abnormal behavior detection has been gaining attention as an essential core technology in fields such as public safety, smart cities, and intelligent surveillance [1].
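The short-window splitting described in the abstract above can be illustrated with a minimal sketch. The function name `short_window_clips`, the 0.5-second stride, and the choice to drop a trailing partial window are assumptions made for illustration; the abstract only states that videos are divided into 1-2 second clips and that every frame inside a clip is used.

```python
def short_window_clips(num_frames, fps, window_sec=1.0, stride_sec=0.5):
    """Return (start, end) frame index pairs, end exclusive, one per clip.

    Hypothetical helper: the stride and the handling of the trailing
    partial window are assumptions, not details taken from the paper.
    """
    if num_frames <= 0 or fps <= 0:
        return []
    window = max(1, round(window_sec * fps))  # frames per clip
    stride = max(1, round(stride_sec * fps))  # frames between clip starts
    clips = []
    start = 0
    # Slide the window forward; every frame inside a clip is kept.
    while start + window <= num_frames:
        clips.append((start, start + window))
        start += stride
    return clips


# A 3-second video at 30 fps, cut into overlapping 1-second clips.
clips = short_window_clips(num_frames=90, fps=30, window_sec=1.0, stride_sec=0.5)
```

Each resulting index pair would then be captioned and labeled independently; how the LLM-based labeling is invoked is not specified in the abstract and is left out here.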
Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use
Thuau, Sébastien, Haidar, Siba, Chelouah, Rachid
Deep learning-based video surveillance increasingly demands privacy-preserving architectures with low computational and environmental overhead. Federated learning preserves privacy but deploying large vision-language models (VLMs) introduces major energy and sustainability challenges. We compare three strategies for federated violence detection under realistic non-IID splits on the RWF-2000 and RLVS datasets: zero-shot inference with pretrained VLMs, LoRA-based fine-tuning of LLaVA-NeXT-Video-7B, and personalized federated learning of a 65.8M-parameter 3D CNN. All methods exceed 90% accuracy in binary violence detection. The 3D CNN achieves superior calibration (ROC AUC 92.59%) at roughly half the energy cost (240 Wh vs. 570 Wh) of federated LoRA, while VLMs provide richer multimodal reasoning. Hierarchical category grouping (based on semantic similarity and class exclusion) boosts VLM multiclass accuracy from 65.31% to 81% on the UCF-Crime dataset. To our knowledge, this is the first comparative simulation study of LoRA-tuned VLMs and personalized CNNs for federated violence detection, with explicit energy and CO2e quantification. Our results inform hybrid deployment strategies that default to efficient CNNs for routine inference and selectively engage VLMs for complex contextual reasoning.
- Europe > France > Île-de-France > Paris > Paris (0.05)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs
Thuau, Sébastien, Haidar, Siba, Bajracharya, Ayush, Chelouah, Rachid
We examine frugal federated learning approaches to violence detection by comparing two complementary strategies: (i) zero-shot and federated fine-tuning of vision-language models (VLMs), and (ii) personalized training of a compact 3D convolutional neural network (CNN3D). Using LLaVA-7B and a 65.8M-parameter CNN3D as representative cases, we evaluate accuracy, calibration, and energy usage under realistic non-IID settings. Both approaches exceed 90% accuracy. CNN3D slightly outperforms Low-Rank Adaptation (LoRA)-tuned VLMs in ROC AUC and log loss, while using less energy. VLMs remain favorable for contextual reasoning and multimodal inference. We quantify energy and CO$_2$ emissions across training and inference, and analyze sustainability trade-offs for deployment. To our knowledge, this is the first comparative study of LoRA-tuned vision-language models and personalized CNNs for federated violence detection, with an emphasis on energy efficiency and environmental metrics. These findings support a hybrid model: lightweight CNNs for routine classification, with selective VLM activation for complex or descriptive scenarios. The resulting framework offers a reproducible baseline for responsible, resource-aware AI in video surveillance, with extensions toward real-time, multimodal, and lifecycle-aware systems.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
- Energy (1.00)
- Information Technology > Security & Privacy (0.68)
- Asia > China > Chongqing Province > Chongqing (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
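The two calibration metrics named in the frugal federated learning abstract above, ROC AUC and log loss, can be computed from first principles as in the sketch below. The labels and scores are made-up toy values, not results from the paper, and the function names are chosen here for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = violent clip, 0 = non-violent clip.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
loss = log_loss(labels, scores)
```

ROC AUC depends only on the ranking of the scores, whereas log loss also penalizes confident wrong probabilities, which is why the two are often reported together.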
Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
Senadeera, Damith Chamalke, Yang, Xiaoyun, Li, Shibo, Awais, Muhammad, Kollias, Dimitrios, Slabaugh, Gregory
The rapid proliferation of surveillance cameras has increased the demand for automated violence detection. While CNNs and Transformers have shown success in extracting spatio-temporal features, they struggle with long-term dependencies and computational efficiency. We propose Dual Branch VideoMamba with Gated Class Token Fusion (GCTF), an efficient architecture combining a dual-branch design and a state-space model (SSM) backbone, where one branch captures spatial features while the other focuses on temporal dynamics. The model performs continuous fusion via a gating mechanism between the branches to enhance its ability to detect violent activities even in challenging surveillance scenarios. We also present a new benchmark for video violence detection by merging the RWF-2000, RLVS, SURV and VioPeru datasets, ensuring strict separation between training and testing sets. Experimental results demonstrate that our model achieves state-of-the-art performance on this benchmark and also on the DVD dataset, another novel video violence detection dataset, offering an optimal balance between accuracy and computational efficiency and demonstrating the promise of SSMs for scalable, near real-time surveillance violence detection.
- North America > United States (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- South America > Peru (0.04)
- Asia > China (0.14)
- Europe > Netherlands (0.14)
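The gating between the spatial and temporal branches mentioned in the Dual Branch VideoMamba abstract above can be pictured with a generic gated blend of two class tokens. The abstract does not give the exact formulation, so the sigmoid gate over the concatenated tokens, the function name `gated_fusion`, and the parameter shapes below are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(spatial_token, temporal_token, gate_weights, gate_bias):
    """Blend two class tokens elementwise: g * spatial + (1 - g) * temporal.

    Assumed form, not the paper's: the gate g for each dimension is a
    sigmoid of a linear map applied to the concatenated tokens.
    gate_weights has one row per output dimension, each of length 2 * dim.
    """
    concat = list(spatial_token) + list(temporal_token)
    fused = []
    for i, (s, t) in enumerate(zip(spatial_token, temporal_token)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)
        fused.append(g * s + (1.0 - g) * t)
    return fused


# With zero weights and bias the gate is 0.5, so the result is a plain average.
fused = gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
```

In a trained model the gate parameters would be learned, letting the network lean on the spatial or temporal branch per feature dimension.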
Intelligent Image Sensing for Crime Analysis: A ML Approach towards Enhanced Violence Detection and Investigation
Dutta, Aritra, Boral, Pushpita, Suseela, G
The increasing global crime rate, coupled with substantial human and property losses, highlights the limitations of traditional surveillance methods in promptly detecting diverse and unexpected acts of violence. Addressing this pressing need for automatic violence detection, we leverage Machine Learning to detect and categorize violent events in video streams. This paper introduces a comprehensive framework for violence detection and classification, employing Supervised Learning for both binary and multi-class violence classification. The detection model relies on 3D Convolutional Neural Networks, while the classification model utilizes the separable convolutional 3D model for feature extraction and bidirectional LSTM for temporal processing. Training is conducted on diverse customized datasets with frame-level annotations, incorporating videos from surveillance cameras, human recordings, and the hockey fight, sohas and wvd datasets across various platforms. Additionally, a camera module integrated with a Raspberry Pi is used to capture a live video feed, which is sent to the ML model for processing. The framework thus demonstrates improved performance in terms of computational resource efficiency and accuracy.
- Commercial Services & Supplies > Security & Alarm Services (0.89)
- Leisure & Entertainment > Sports > Hockey (0.36)
Cross-Platform Violence Detection on Social Media: A Dataset and Analysis
Chen, Celia, Beland, Scotty, Burghardt, Ingo, Byczek, Jill, Conway, William J., Cotugno, Eric, Davre, Sadaf, Fletcher, Megan, Gnanasekaran, Rajesh Kumar, Hamilton, Kristin, Harbert, Marilyn, Heustis, Jordan, Jha, Tanaya, Klein, Emily, Kramer, Hayden, Leitch, Alex, Perkins, Jessica, Sherman, Casi, Sterrn, Celia, Stevens, Logan, Zarrella, Rebecca, Golbeck, Jennifer
Violent threats remain a significant problem across social media platforms. Useful, high-quality data facilitates research into the understanding and detection of malicious content, including violence. In this paper, we introduce a cross-platform dataset of 30,000 posts hand-coded for violent threats and sub-types of violence, including political and sexual violence. To evaluate the signal present in this dataset, we perform a machine learning analysis with an existing dataset of violent comments from YouTube. We find that, despite originating from different platforms and using different coding criteria, we achieve high classification accuracy both by training on one dataset and testing on the other, and in a merged dataset condition. These results have implications for content-classification strategies and for understanding violent content across social media.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.05)
- North America > United States > New York > Rensselaer County > Troy (0.04)
Exploring Personalized Federated Learning Architectures for Violence Detection in Surveillance Videos
Kassir, Mohammad, Haidar, Siba, Yaacoub, Antoun
The challenge of detecting violent incidents in urban surveillance systems is compounded by the voluminous and diverse nature of video data. This paper presents a targeted approach using Personalized Federated Learning (PFL) to address these issues, specifically employing the Federated Learning with Personalization Layers method within the Flower framework. Our methodology adapts learning models to the unique data characteristics of each surveillance node, effectively managing the heterogeneous and non-IID nature of surveillance video data. Through rigorous experiments conducted on balanced and imbalanced datasets, our PFL models demonstrated enhanced accuracy and efficiency, achieving up to 99.3% accuracy. This study underscores the potential of PFL to significantly improve the scalability and effectiveness of surveillance systems, offering a robust, privacy-preserving solution for violence detection in complex urban environments.
- Europe > France > Île-de-France > Paris > Paris (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
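The core idea of Federated Learning with Personalization Layers, as used in the personalized federated learning abstract above, is that only shared base layers are averaged across clients while each client keeps its own personalization layers. The sketch below shows one aggregation round on plain Python lists. It does not use the Flower API; the layer names, the unweighted mean (federated averaging normally weights clients by sample count), and the function name `fedper_round` are simplifying assumptions.

```python
def fedper_round(client_models, base_keys):
    """Average only the shared base layers; leave all other layers local.

    client_models: list of dicts mapping layer name -> list of floats.
    base_keys: names of the layers that are shared across clients.
    Simplification: an unweighted mean is used instead of weighting
    each client by its number of training samples.
    """
    n = len(client_models)
    averaged = {}
    for key in base_keys:
        length = len(client_models[0][key])
        averaged[key] = [
            sum(model[key][i] for model in client_models) / n
            for i in range(length)
        ]
    updated = []
    for model in client_models:
        new_model = dict(model)  # personalization layers are carried over as-is
        for key in base_keys:
            new_model[key] = list(averaged[key])
        updated.append(new_model)
    return updated


# Two hypothetical surveillance nodes, each with a shared base and a local head.
clients = [
    {"base": [1.0, 2.0], "head": [10.0]},
    {"base": [3.0, 4.0], "head": [20.0]},
]
updated = fedper_round(clients, base_keys=["base"])
```

After the round, both clients hold the same base parameters but different heads, which is what lets each node adapt to its own non-IID video data.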
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Jiang, Wen-Dong, Chang, Chih-Yung, Roy, Diptendu Sinha
Recently, violence detection systems developed using unified multimodal models have achieved significant success and attracted widespread attention. However, most of these systems face two critical challenges: the lack of interpretability as black-box models and limited functionality, offering only classification or retrieval capabilities. To address these challenges, this paper proposes a novel interpretable violence detection system, termed the Three-in-One (TIO) System. The TIO system integrates knowledge graphs (KG) and graph attention networks (GAT) to provide three core functionalities: detection, retrieval, and explanation. Specifically, the system processes each video frame along with text descriptions generated by a large language model (LLM) for videos containing potential violent behavior. It employs ImageBind to generate high-dimensional embeddings for constructing a knowledge graph, uses GAT for reasoning, and applies lightweight time series modules to extract video embedding features. The final step connects a classifier and retriever for multi-functional outputs. The interpretability of KG enables the system to verify the reasoning process behind each output. Additionally, the paper introduces several lightweight methods to reduce the resource consumption of the TIO system and enhance its efficiency. Extensive experiments conducted on the XD-Violence and UCF-Crime datasets validate the effectiveness of the proposed system. A case study further reveals an intriguing phenomenon: as the number of bystanders increases, the occurrence of violent behavior tends to decrease.
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > India > Meghalaya > Shillong (0.04)
- North America > United States > West Virginia (0.04)
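The graph attention network (GAT) reasoning step mentioned in the TIO system abstract above rests on a standard attention computation: each node scores its neighbors, normalizes the scores with a softmax, and aggregates neighbor features with those weights. The sketch below shows a single attention head on an already-projected toy graph. The node features, the attention vectors, and the function names are hypothetical, and nothing here reflects how the TIO system builds its knowledge graph.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_attention(h, neighbors, a_src, a_dst):
    """Single-head GAT attention weights alpha[i][j] over node i's neighbors.

    h: dict node -> feature list (assumed already linearly projected).
    The score a^T [h_i || h_j] is split as a_src . h_i + a_dst . h_j.
    Every node needs at least one neighbor (for example a self-loop).
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {}
        for j in nbrs:
            e = sum(a * x for a, x in zip(a_src, h[i]))
            e += sum(a * x for a, x in zip(a_dst, h[j]))
            scores[j] = leaky_relu(e)
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        alpha[i] = {j: v / total for j, v in exps.items()}
    return alpha


def gat_aggregate(h, alpha):
    """New node features: attention-weighted sum of neighbor features."""
    out = {}
    for i, weights in alpha.items():
        dim = len(h[i])
        out[i] = [sum(w * h[j][d] for j, w in weights.items()) for d in range(dim)]
    return out


# Toy graph (hypothetical): three nodes with self-loops.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
graph = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}
alpha = gat_attention(features, graph, a_src=[0.5, -0.5], a_dst=[1.0, 0.2])
new_features = gat_aggregate(features, alpha)
```

Because the attention weights are explicit per edge, they can be inspected after the fact, which fits the interpretability goal the abstract describes.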